Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 420 | 431 |
| Missing cells (%) | 7.8% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
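The percentage and per-record figures in this table can be re-derived from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is the missing count divided by variables × observations and that 1 KiB = 1024 B. The helper names are illustrative and are not part of ydata-profiling.

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_cell_pct(missing_cells: int) -> float:
    """Share of missing cells among all cells, in percent."""
    total_cells = N_VARIABLES * N_OBSERVATIONS
    return 100 * missing_cells / total_cells


def avg_record_bytes(total_kib: float) -> float:
    """Average in-memory size of one row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / N_OBSERVATIONS


pct_a = round(missing_cell_pct(420), 1)       # Dataset A
pct_b = round(missing_cell_pct(431), 1)       # Dataset B
record_size = round(avg_record_bytes(45.3), 1)
print(pct_a, pct_b, record_size)
```

Under these assumptions the rounded results agree with the 7.8%, 8.1% and 104.0 B shown in the table.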
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 81 (18.2%) missing values | Age has 81 (18.2%) missing values | Missing |
| Cabin has 339 (76.0%) missing values | Cabin has 349 (78.3%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 307 (68.8%) zeros | SibSp has 307 (68.8%) zeros | Zeros |
| Parch has 335 (75.1%) zeros | Parch has 346 (77.6%) zeros | Zeros |
| Fare has 5 (1.1%) zeros | Fare has 7 (1.6%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-07-19 17:52:21.133129 | 2023-07-19 17:52:25.277918 |
| Analysis finished | 2023-07-19 17:52:25.276663 | 2023-07-19 17:52:29.307974 |
| Duration | 4.14 seconds | 4.03 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
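The "Duration" row is the difference between the two timestamps. Here is a small sketch using only the standard library. The format string is an assumption read off the timestamps as printed here.

```python
from datetime import datetime

# Timestamp layout as it appears in the "Reproduction" table (assumed).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


duration_a = duration_seconds("2023-07-19 17:52:21.133129",
                              "2023-07-19 17:52:25.276663")
duration_b = duration_seconds("2023-07-19 17:52:25.277918",
                              "2023-07-19 17:52:29.307974")
print(f"{duration_a:.2f} s, {duration_b:.2f} s")
```

The timestamps also show that the two analyses ran back to back: Dataset B started about a millisecond after Dataset A finished.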
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 439.12332 | 432.13453 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| 5-th percentile | 41.25 | 45.25 |
| Q1 | 220.5 | 200 |
| median | 432.5 | 429.5 |
| Q3 | 655.75 | 663.5 |
| 95-th percentile | 847.25 | 848 |
| Maximum | 891 | 891 |
| Range | 890 | 888 |
| Interquartile range (IQR) | 435.25 | 463.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 255.48674 | 261.12986 |
| Coefficient of variation (CV) | 0.58181092 | 0.60427909 |
| Kurtosis | -1.1830659 | -1.2224571 |
| Mean | 439.12332 | 432.13453 |
| Median Absolute Deviation (MAD) | 216.5 | 232 |
| Skewness | 0.071724899 | 0.084067407 |
| Sum | 195849 | 192732 |
| Variance | 65273.475 | 68188.804 |
| Monotonicity | Not monotonic | Not monotonic |
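Several rows of the quantile and descriptive tables are derived from other rows. The sketch below recomputes them for Dataset A from the rounded figures printed in the report, so small rounding differences are expected. The variable names are illustrative.

```python
# Dataset A figures for PassengerId, copied from the tables.
mean, std = 439.12332, 255.48674
minimum, maximum = 1, 891
q1, q3 = 220.5, 655.75

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
value_range = maximum - minimum   # "Range" row
iqr = q3 - q1                     # "Interquartile range (IQR)" row

print(round(cv, 6), round(variance, 2), value_range, iqr)
```

The same relations can be checked against the Dataset B column.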
| Value | Count | Frequency (%) |
| 780 | 1 | 0.2% |
| 82 | 1 | 0.2% |
| 308 | 1 | 0.2% |
| 219 | 1 | 0.2% |
| 49 | 1 | 0.2% |
| 91 | 1 | 0.2% |
| 383 | 1 | 0.2% |
| 112 | 1 | 0.2% |
| 766 | 1 | 0.2% |
| 249 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 614 | 1 | 0.2% |
| 158 | 1 | 0.2% |
| 168 | 1 | 0.2% |
| 608 | 1 | 0.2% |
| 556 | 1 | 0.2% |
| 136 | 1 | 0.2% |
| 832 | 1 | 0.2% |
| 148 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 881 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 16 | 1 | |
| 17 | 1 | |
| 18 | 1 | |
| 20 | 1 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 4 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 11 | 1 | |
| 13 | 1 | |
| 16 | 1 | |
| 18 | 1 | |
| 21 | 1 | |
| 22 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 1 | 1 |
| 5th row | 0 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 269 | 60.3% |
| 1 | 177 | 39.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 262 | 58.7% |
| 1 | 184 | 41.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 262 | |
| 1 | 184 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 262 | |
| 1 | 184 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 262 | |
| 1 | 184 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 262 | |
| 1 | 184 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 3 |
| 2nd row | 3 | 1 |
| 3rd row | 3 | 1 |
| 4th row | 1 | 3 |
| 5th row | 2 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 246 | 55.2% |
| 1 | 118 | 26.5% |
| 2 | 82 | 18.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 244 | 54.7% |
| 1 | 104 | 23.3% |
| 2 | 98 | 22.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 246 | |
| 1 | 118 | |
| 2 | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 246 | |
| 1 | 118 | |
| 2 | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 246 | |
| 1 | 118 | |
| 2 | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 246 | |
| 1 | 118 | |
| 2 | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 49 | 49.5 |
| Mean length | 26.912556 | 27.006726 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 12003 | 12045 |
| Distinct characters | 59 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | Horgan, Mr. John |
| 2nd row | Coutts, Master. William Loch "William" | Weir, Col. John |
| 3rd row | Vande Velde, Mr. Johannes Joseph | Lewy, Mr. Ervin G |
| 4th row | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | Madigan, Miss. Margaret "Maggie" |
| 5th row | Slemen, Mr. Richard James | Osman, Mrs. Mara |
| Value | Count | Frequency (%) |
| mr | 251 | 13.8% |
| miss | 96 | 5.3% |
| mrs | 60 | 3.3% |
| william | 31 | 1.7% |
| master | 26 | 1.4% |
| john | 24 | 1.3% |
| henry | 17 | 0.9% |
| charles | 14 | 0.8% |
| james | 12 | 0.7% |
| thomas | 12 | 0.7% |
| Other values (886) | 1270 |
| Value | Count | Frequency (%) |
| mr | 248 | 13.7% |
| miss | 99 | 5.5% |
| mrs | 66 | 3.6% |
| william | 35 | 1.9% |
| john | 28 | 1.5% |
| master | 23 | 1.3% |
| henry | 18 | 1.0% |
| charles | 13 | 0.7% |
| james | 12 | 0.7% |
| mary | 11 | 0.6% |
| Other values (896) | 1257 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1369 | 11.4% |
| r | 959 | 8.0% |
| a | 853 | 7.1% |
| e | 815 | 6.8% |
| i | 682 | 5.7% |
| s | 644 | 5.4% |
| n | 611 | 5.1% |
| M | 559 | 4.7% |
| l | 532 | 4.4% |
| o | 515 | 4.3% |
| Other values (49) | 4464 |
| Value | Count | Frequency (%) |
| (space) | 1366 | 11.3% |
| r | 944 | 7.8% |
| e | 871 | 7.2% |
| a | 850 | 7.1% |
| i | 699 | 5.8% |
| s | 677 | 5.6% |
| n | 652 | 5.4% |
| M | 564 | 4.7% |
| l | 543 | 4.5% |
| o | 493 | 4.1% |
| Other values (49) | 4386 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7724 | |
| Uppercase Letter | 1821 | 15.2% |
| Space Separator | 1369 | 11.4% |
| Other Punctuation | 949 | 7.9% |
| Close Punctuation | 67 | 0.6% |
| Open Punctuation | 67 | 0.6% |
| Dash Punctuation | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7747 | |
| Uppercase Letter | 1817 | 15.1% |
| Space Separator | 1366 | 11.3% |
| Other Punctuation | 958 | 8.0% |
| Open Punctuation | 76 | 0.6% |
| Close Punctuation | 76 | 0.6% |
| Dash Punctuation | 5 | < 0.1% |
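The category names in these tables match the Unicode general categories (`Ll` Lowercase Letter, `Lu` Uppercase Letter, `Zs` Space Separator, `Po` Other Punctuation, `Ps`/`Pe` Open/Close Punctuation, `Pd` Dash Punctuation). The sketch below tallies categories for one of the sample names with Python's standard `unicodedata` module. It illustrates the idea only; whether ydata-profiling computes its counts this way is an assumption.

```python
import unicodedata
from collections import Counter

# One of the sample names shown in the "Sample" table.
name = "Baxter, Mrs. James (Helene DeLaudeniere Chaput)"

# unicodedata.category returns a two-letter code such as "Ll" or "Zs".
category_counts = Counter(unicodedata.category(ch) for ch in name)

for code, count in category_counts.most_common():
    print(code, count)
```

Both the comma and the period fall under `Po`, which is why "Other Punctuation" is far larger than the bracket and dash categories in the tables.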
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1369 |
| Value | Count | Frequency (%) |
| (space) | 1366 |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 959 | |
| a | 853 | |
| e | 815 | |
| i | 682 | |
| s | 644 | |
| n | 611 | |
| l | 532 | 6.9% |
| o | 515 | 6.7% |
| t | 360 | 4.7% |
| h | 271 | 3.5% |
| Other values (16) | 1482 |
| Value | Count | Frequency (%) |
| r | 944 | |
| e | 871 | |
| a | 850 | |
| i | 699 | |
| s | 677 | |
| n | 652 | |
| l | 543 | 7.0% |
| o | 493 | 6.4% |
| t | 334 | 4.3% |
| h | 260 | 3.4% |
| Other values (16) | 1424 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 559 | |
| A | 136 | 7.5% |
| H | 102 | 5.6% |
| J | 100 | 5.5% |
| E | 94 | 5.2% |
| C | 89 | 4.9% |
| S | 85 | 4.7% |
| W | 75 | 4.1% |
| B | 73 | 4.0% |
| L | 60 | 3.3% |
| Other values (15) | 448 |
| Value | Count | Frequency (%) |
| M | 564 | |
| A | 120 | 6.6% |
| J | 115 | 6.3% |
| H | 101 | 5.6% |
| S | 88 | 4.8% |
| B | 83 | 4.6% |
| C | 81 | 4.5% |
| W | 72 | 4.0% |
| L | 71 | 3.9% |
| E | 70 | 3.9% |
| Other values (15) | 452 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 54 | 5.7% |
| ' | 2 | 0.2% |
| Value | Count | Frequency (%) |
| , | 446 | |
| . | 446 | |
| " | 60 | 6.3% |
| ' | 6 | 0.6% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 67 |
| Value | Count | Frequency (%) |
| ) | 76 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 67 |
| Value | Count | Frequency (%) |
| ( | 76 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6 |
| Value | Count | Frequency (%) |
| - | 5 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9545 | |
| Common | 2458 | 20.5% |
| Value | Count | Frequency (%) |
| Latin | 9564 | |
| Common | 2481 | 20.6% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1369 | |
| . | 447 | 18.2% |
| , | 446 | 18.1% |
| ) | 67 | 2.7% |
| ( | 67 | 2.7% |
| " | 54 | 2.2% |
| - | 6 | 0.2% |
| ' | 2 | 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1366 | |
| , | 446 | 18.0% |
| . | 446 | 18.0% |
| ( | 76 | 3.1% |
| ) | 76 | 3.1% |
| " | 60 | 2.4% |
| ' | 6 | 0.2% |
| - | 5 | 0.2% |
Latin
| Value | Count | Frequency (%) |
| r | 959 | 10.0% |
| a | 853 | 8.9% |
| e | 815 | 8.5% |
| i | 682 | 7.1% |
| s | 644 | 6.7% |
| n | 611 | 6.4% |
| M | 559 | 5.9% |
| l | 532 | 5.6% |
| o | 515 | 5.4% |
| t | 360 | 3.8% |
| Other values (41) | 3015 |
| Value | Count | Frequency (%) |
| r | 944 | 9.9% |
| e | 871 | 9.1% |
| a | 850 | 8.9% |
| i | 699 | 7.3% |
| s | 677 | 7.1% |
| n | 652 | 6.8% |
| M | 564 | 5.9% |
| l | 543 | 5.7% |
| o | 493 | 5.2% |
| t | 334 | 3.5% |
| Other values (41) | 2937 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12003 |
| Value | Count | Frequency (%) |
| ASCII | 12045 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1369 | 11.4% |
| r | 959 | 8.0% |
| a | 853 | 7.1% |
| e | 815 | 6.8% |
| i | 682 | 5.7% |
| s | 644 | 5.4% |
| n | 611 | 5.1% |
| M | 559 | 4.7% |
| l | 532 | 4.4% |
| o | 515 | 4.3% |
| Other values (49) | 4464 |
| Value | Count | Frequency (%) |
| (space) | 1366 | 11.3% |
| r | 944 | 7.8% |
| e | 871 | 7.2% |
| a | 850 | 7.1% |
| i | 699 | 5.8% |
| s | 677 | 5.6% |
| n | 652 | 5.4% |
| M | 564 | 4.7% |
| l | 543 | 4.5% |
| o | 493 | 4.1% |
| Other values (49) | 4386 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7040359 | 4.7488789 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2098 | 2118 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | male |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | female | female |
| 5th row | male | female |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 289 | 64.8% |
| female | 157 | 35.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 279 | 62.6% |
| female | 167 | 37.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
| Value | Count | Frequency (%) |
| e | 613 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 167 | 7.9% |
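The character counts for Sex follow directly from the value counts: every row contributes the letters of either "male" or "female". Here is a minimal sketch for Dataset A, using the counts 289 and 157 from the report. The variable names are illustrative.

```python
from collections import Counter

# Dataset A value counts for Sex, as reported in "Common Values".
value_counts = {"male": 289, "female": 157}

# Each occurrence of a value contributes each of its characters once.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())
f_share = 100 * char_counts["f"] / total_chars

print(char_counts.most_common())
print(total_chars, round(mean_length, 7), round(f_share, 1))
```

The letter "e" appears once in "male" and twice in "female", giving 289 + 2 × 157 = 603. The total of 2098 characters and the mean length of about 4.704 agree with the "Length" and "Characters and Unicode" tables.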
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2098 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2118 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
| Value | Count | Frequency (%) |
| e | 613 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 167 | 7.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2098 |
| Value | Count | Frequency (%) |
| Latin | 2118 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
| Value | Count | Frequency (%) |
| e | 613 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 167 | 7.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2098 |
| Value | Count | Frequency (%) |
| ASCII | 2118 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
| Value | Count | Frequency (%) |
| e | 613 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 167 | 7.9% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 76 | 73 |
| Distinct (%) | 20.8% | 20.0% |
| Missing | 81 | 81 |
| Missing (%) | 18.2% | 18.2% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.678329 | 29.222137 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.83 |
| Maximum | 80 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.83 |
| 5-th percentile | 4 | 4.2 |
| Q1 | 20 | 21 |
| median | 28 | 28 |
| Q3 | 39 | 36.5 |
| 95-th percentile | 57.8 | 52 |
| Maximum | 80 | 80 |
| Range | 79.58 | 79.17 |
| Interquartile range (IQR) | 19 | 15.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.909219 | 13.626778 |
| Coefficient of variation (CV) | 0.50236045 | 0.46631695 |
| Kurtosis | -0.041439202 | 0.69039792 |
| Mean | 29.678329 | 29.222137 |
| Median Absolute Deviation (MAD) | 9 | 8 |
| Skewness | 0.29529938 | 0.40503663 |
| Sum | 10832.59 | 10666.08 |
| Variance | 222.2848 | 185.68908 |
| Monotonicity | Not monotonic | Not monotonic |
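Because Age has missing values, the denominators matter. The figures in the report are consistent with the missing percentage being taken over all 446 rows, and the distinct percentage and mean over the 365 non-missing rows. The sketch below checks that reading for Dataset A. It is an inference from the printed numbers, not a statement about the library's internals.

```python
# Dataset A figures for Age, copied from the tables.
n_rows = 446
missing = 81
distinct = 76
total = 10832.59              # "Sum" row

present = n_rows - missing    # rows with a recorded age

missing_pct = 100 * missing / n_rows      # relative to all rows
distinct_pct = 100 * distinct / present   # relative to non-missing rows
mean = total / present                    # mean over non-missing rows

print(present, round(missing_pct, 1), round(distinct_pct, 1), round(mean, 6))
```

For Dataset B the same reading gives 73 / 365 = 20.0% distinct, which also matches the table.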
| Value | Count | Frequency (%) |
| 22 | 16 | 3.6% |
| 24 | 15 | 3.4% |
| 18 | 13 | 2.9% |
| 30 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 33 | 11 | 2.5% |
| 21 | 11 | 2.5% |
| 35 | 11 | 2.5% |
| 27 | 11 | 2.5% |
| 28 | 11 | 2.5% |
| Other values (66) | 240 | |
| (Missing) | 81 | 18.2% |
| Value | Count | Frequency (%) |
| 22 | 16 | 3.6% |
| 19 | 16 | 3.6% |
| 28 | 14 | 3.1% |
| 24 | 14 | 3.1% |
| 21 | 14 | 3.1% |
| 27 | 13 | 2.9% |
| 30 | 13 | 2.9% |
| 29 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 32 | 12 | 2.7% |
| Other values (63) | 229 | |
| (Missing) | 81 | 18.2% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 5 | |
| 3 | 3 | |
| 4 | 7 | |
| 5 | 3 | |
| 7 | 2 | 0.4% |
| 8 | 4 |
| Value | Count | Frequency (%) |
| 0.83 | 2 | 0.4% |
| 0.92 | 1 | 0.2% |
| 1 | 6 | |
| 2 | 3 | |
| 3 | 4 | |
| 4 | 3 | |
| 5 | 3 | |
| 7 | 1 | 0.2% |
| 8 | 2 | 0.4% |
| 9 | 4 |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 7 |
| Distinct (%) | 1.3% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.47757848 | 0.46636771 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 8 |
| Zeros | 307 | 307 |
| Zeros (%) | 68.8% | 68.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 8 |
| Range | 5 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.92563466 | 0.93975315 |
| Coefficient of variation (CV) | 1.9381834 | 2.0150476 |
| Kurtosis | 8.1336245 | 15.366636 |
| Mean | 0.47757848 | 0.46636771 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.7043685 | 3.3533975 |
| Sum | 213 | 208 |
| Variance | 0.85679952 | 0.88313599 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 104 | 23.3% |
| 2 | 14 | 3.1% |
| 4 | 10 | 2.2% |
| 3 | 7 | 1.6% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 109 | 24.4% |
| 2 | 10 | 2.2% |
| 3 | 9 | 2.0% |
| 4 | 6 | 1.3% |
| 5 | 4 | 0.9% |
| 8 | 1 | 0.2% |
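For a low-cardinality column such as SibSp, the summary statistics can be rebuilt from the value counts alone. The sketch below expands the Dataset A counts and uses the standard `statistics` module. The reported variance is matched by the sample variance (divisor n − 1), which is an observation from the numbers rather than a documented guarantee.

```python
import statistics

# Dataset A value counts for SibSp, from the first table above.
value_counts = {0: 307, 1: 104, 2: 14, 4: 10, 3: 7, 5: 4}

# Expand the counts back into one value per row.
values = [v for v, count in value_counts.items() for _ in range(count)]

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divisor n - 1
zeros_pct = 100 * value_counts[0] / len(values)

print(len(values), total, round(mean, 8), round(variance, 8), round(zeros_pct, 1))
```

The results line up with the "Sum" (213), "Mean" (0.47757848), "Variance" (0.85679952) and "Zeros (%)" (68.8%) rows for Dataset A.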
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 104 | 23.3% |
| 2 | 14 | 3.1% |
| 3 | 7 | 1.6% |
| 4 | 10 | 2.2% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 109 | 24.4% |
| 2 | 10 | 2.2% |
| 3 | 9 | 2.0% |
| 4 | 6 | 1.3% |
| 5 | 4 | 0.9% |
| 8 | 1 | 0.2% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39461883 | 0.34304933 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 6 |
| Zeros | 335 | 346 |
| Zeros (%) | 75.1% | 77.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 6 |
| Range | 5 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.79400434 | 0.72903423 |
| Coefficient of variation (CV) | 2.0120792 | 2.1251586 |
| Kurtosis | 6.857774 | 10.095701 |
| Mean | 0.39461883 | 0.34304933 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.392104 | 2.6541143 |
| Sum | 176 | 153 |
| Variance | 0.63044289 | 0.53149091 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 335 | |
| 1 | 60 | 13.5% |
| 2 | 43 | 9.6% |
| 3 | 4 | 0.9% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 346 | |
| 1 | 55 | 12.3% |
| 2 | 41 | 9.2% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 335 | |
| 1 | 60 | 13.5% |
| 2 | 43 | 9.6% |
| 3 | 4 | 0.9% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 346 | |
| 1 | 55 | 12.3% |
| 2 | 41 | 9.2% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 380 | 386 |
| Distinct (%) | 85.2% | 86.5% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.8946188 | 6.8766816 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 3075 | 3067 |
| Distinct characters | 35 | 31 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 329 | 338 |
| Unique (%) | 73.8% | 75.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 24160 | 370377 |
| 2nd row | C.A. 37671 | 113800 |
| 3rd row | 345780 | PC 17612 |
| 4th row | PC 17558 | 370370 |
| 5th row | 28206 | 349244 |
| Value | Count | Frequency (%) |
| pc | 39 | 6.7% |
| c.a | 10 | 1.7% |
| a/5 | 8 | 1.4% |
| ston/o | 8 | 1.4% |
| 2 | 8 | 1.4% |
| ston/o2 | 6 | 1.0% |
| soton/o.q | 5 | 0.9% |
| ca | 5 | 0.9% |
| w./c | 5 | 0.9% |
| a/4 | 5 | 0.9% |
| Other values (402) | 480 |
| Value | Count | Frequency (%) |
| pc | 25 | 4.4% |
| c.a | 11 | 1.9% |
| ston/o | 10 | 1.8% |
| 2 | 10 | 1.8% |
| a/5 | 8 | 1.4% |
| ca | 7 | 1.2% |
| w./c | 7 | 1.2% |
| sc/paris | 6 | 1.1% |
| ston/o2 | 6 | 1.1% |
| 2144 | 5 | 0.9% |
| Other values (407) | 476 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 368 | |
| 2 | 306 | |
| 7 | 247 | 8.0% |
| 4 | 225 | 7.3% |
| 6 | 222 | 7.2% |
| 5 | 196 | 6.4% |
| 0 | 194 | 6.3% |
| 9 | 156 | 5.1% |
| 8 | 137 | 4.5% |
| Other values (25) | 651 |
| Value | Count | Frequency (%) |
| 3 | 364 | |
| 1 | 356 | |
| 2 | 313 | |
| 7 | 245 | 8.0% |
| 4 | 233 | 7.6% |
| 0 | 221 | 7.2% |
| 6 | 196 | 6.4% |
| 5 | 186 | 6.1% |
| 9 | 157 | 5.1% |
| 8 | 142 | 4.6% |
| Other values (21) | 654 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2424 | |
| Uppercase Letter | 356 | 11.6% |
| Other Punctuation | 149 | 4.8% |
| Space Separator | 133 | 4.3% |
| Lowercase Letter | 13 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 2413 | |
| Uppercase Letter | 351 | 11.4% |
| Other Punctuation | 165 | 5.4% |
| Space Separator | 125 | 4.1% |
| Lowercase Letter | 13 | 0.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 368 | |
| 2 | 306 | |
| 7 | 247 | |
| 4 | 225 | |
| 6 | 222 | |
| 5 | 196 | |
| 0 | 194 | |
| 9 | 156 | |
| 8 | 137 | 5.7% |
| Value | Count | Frequency (%) |
| 3 | 364 | |
| 1 | 356 | |
| 2 | 313 | |
| 7 | 245 | |
| 4 | 233 | |
| 0 | 221 | |
| 6 | 196 | |
| 5 | 186 | |
| 9 | 157 | |
| 8 | 142 | 5.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 133 |
| Value | Count | Frequency (%) |
| (space) | 125 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 92 | |
| / | 57 |
| Value | Count | Frequency (%) |
| . | 108 | |
| / | 57 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 80 | |
| O | 59 | |
| P | 54 | |
| S | 42 | |
| A | 38 | |
| N | 24 | 6.7% |
| T | 23 | 6.5% |
| W | 9 | 2.5% |
| Q | 9 | 2.5% |
| F | 4 | 1.1% |
| Other values (6) | 14 | 3.9% |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 62 | |
| P | 49 | |
| S | 45 | |
| A | 35 | |
| N | 26 | 7.4% |
| T | 24 | 6.8% |
| W | 11 | 3.1% |
| Q | 7 | 2.0% |
| I | 6 | 1.7% |
| Other values (4) | 14 | 4.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 4 | |
| s | 3 | |
| r | 2 | |
| i | 2 | |
| l | 1 | 7.7% |
| e | 1 | 7.7% |
| Value | Count | Frequency (%) |
| a | 4 | |
| i | 3 | |
| s | 3 | |
| r | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2706 | |
| Latin | 369 | 12.0% |
| Value | Count | Frequency (%) |
| Common | 2703 | |
| Latin | 364 | 11.9% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 368 | |
| 2 | 306 | |
| 7 | 247 | |
| 4 | 225 | |
| 6 | 222 | |
| 5 | 196 | |
| 0 | 194 | |
| 9 | 156 | |
| 8 | 137 | 5.1% |
| Other values (3) | 282 |
| Value | Count | Frequency (%) |
| 3 | 364 | |
| 1 | 356 | |
| 2 | 313 | |
| 7 | 245 | |
| 4 | 233 | |
| 0 | 221 | |
| 6 | 196 | |
| 5 | 186 | |
| 9 | 157 | |
| 8 | 142 | 5.3% |
| Other values (3) | 290 |
Latin
| Value | Count | Frequency (%) |
| C | 80 | |
| O | 59 | |
| P | 54 | |
| S | 42 | |
| A | 38 | |
| N | 24 | 6.5% |
| T | 23 | 6.2% |
| W | 9 | 2.4% |
| Q | 9 | 2.4% |
| a | 4 | 1.1% |
| Other values (12) | 27 | 7.3% |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 62 | |
| P | 49 | |
| S | 45 | |
| A | 35 | |
| N | 26 | 7.1% |
| T | 24 | 6.6% |
| W | 11 | 3.0% |
| Q | 7 | 1.9% |
| I | 6 | 1.6% |
| Other values (8) | 27 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3075 |
| Value | Count | Frequency (%) |
| ASCII | 3067 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 368 | |
| 2 | 306 | |
| 7 | 247 | 8.0% |
| 4 | 225 | 7.3% |
| 6 | 222 | 7.2% |
| 5 | 196 | 6.4% |
| 0 | 194 | 6.3% |
| 9 | 156 | 5.1% |
| 8 | 137 | 4.5% |
| Other values (25) | 651 |
| Value | Count | Frequency (%) |
| 3 | 364 | |
| 1 | 356 | |
| 2 | 313 | |
| 7 | 245 | 8.0% |
| 4 | 233 | 7.6% |
| 0 | 221 | 7.2% |
| 6 | 196 | 6.4% |
| 5 | 186 | 6.1% |
| 9 | 157 | 5.1% |
| 8 | 142 | 4.6% |
| Other values (21) | 654 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 183 | 183 |
| Distinct (%) | 41.0% | 41.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 34.214321 | 31.734818 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 5 | 7 |
| Zeros (%) | 1.1% | 1.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.8958 | 7.8958 |
| median | 14.5 | 14.25415 |
| Q3 | 32.221875 | 31 |
| 95-th percentile | 120 | 108.28125 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 24.326075 | 23.1042 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 53.579964 | 48.442495 |
| Coefficient of variation (CV) | 1.5660099 | 1.5264778 |
| Kurtosis | 31.262231 | 28.365234 |
| Mean | 34.214321 | 31.734818 |
| Median Absolute Deviation (MAD) | 7.2604 | 6.52085 |
| Skewness | 4.6845301 | 4.4287539 |
| Sum | 15259.587 | 14153.729 |
| Variance | 2870.8126 | 2346.6754 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 7.8958 | 23 | 5.2% |
| 13 | 20 | 4.5% |
| 8.05 | 18 | 4.0% |
| 7.75 | 15 | 3.4% |
| 7.25 | 10 | 2.2% |
| 26 | 10 | 2.2% |
| 7.925 | 9 | 2.0% |
| 7.2292 | 9 | 2.0% |
| 10.5 | 8 | 1.8% |
| 26.55 | 8 | 1.8% |
| Other values (173) | 316 |
| Value | Count | Frequency (%) |
| 7.75 | 23 | 5.2% |
| 13 | 23 | 5.2% |
| 26 | 19 | 4.3% |
| 8.05 | 18 | 4.0% |
| 7.8958 | 17 | 3.8% |
| 10.5 | 13 | 2.9% |
| 7.925 | 12 | 2.7% |
| 7.8542 | 8 | 1.8% |
| 8.6625 | 8 | 1.8% |
| 7.25 | 8 | 1.8% |
| Other values (173) | 297 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 6 | |
| 7.125 | 3 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 5 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 4 | |
| 7.0542 | 1 | 0.2% |
| 7.125 | 4 | |
| 7.225 | 4 |
Cabin
Text
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 90 | 81 |
| Distinct (%) | 84.1% | 83.5% |
| Missing | 339 | 349 |
| Missing (%) | 76.0% | 78.3% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.728972 | 3.6597938 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 399 | 355 |
| Distinct characters | 18 | 19 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 73 | 66 |
| Unique (%) | 68.2% | 68.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | B3 | B42 |
| 2nd row | B58 B60 | B5 |
| 3rd row | E17 | B51 B53 B55 |
| 4th row | A26 | B39 |
| 5th row | E68 | D56 |
| Value | Count | Frequency (%) |
| f | 3 | 2.3% |
| f2 | 2 | 1.6% |
| b53 | 2 | 1.6% |
| f4 | 2 | 1.6% |
| b98 | 2 | 1.6% |
| b96 | 2 | 1.6% |
| d33 | 2 | 1.6% |
| b58 | 2 | 1.6% |
| c68 | 2 | 1.6% |
| c27 | 2 | 1.6% |
| Other values (93) | 108 | 83.7% |
| Value | Count | Frequency (%) |
| c23 | 3 | 2.6% |
| c27 | 3 | 2.6% |
| c25 | 3 | 2.6% |
| b49 | 2 | 1.8% |
| e33 | 2 | 1.8% |
| e44 | 2 | 1.8% |
| f2 | 2 | 1.8% |
| b77 | 2 | 1.8% |
| e101 | 2 | 1.8% |
| c83 | 2 | 1.8% |
| Other values (83) | 91 | 79.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 39 | 9.8% |
| 2 | 37 | 9.3% |
| 3 | 36 | 9.0% |
| B | 34 | 8.5% |
| 1 | 30 | 7.5% |
| 6 | 24 | 6.0% |
| 8 | 23 | 5.8% |
| 4 | 22 | 5.5% |
| 5 | 22 | 5.5% |
| (space) | 22 | 5.5% |
| Other values (8) | 110 | 27.6% |
| Value | Count | Frequency (%) |
| C | 38 | 10.7% |
| 2 | 37 | 10.4% |
| 1 | 32 | 9.0% |
| 3 | 31 | 8.7% |
| B | 28 | 7.9% |
| 6 | 24 | 6.8% |
| 5 | 20 | 5.6% |
| 7 | 20 | 5.6% |
| 4 | 20 | 5.6% |
| E | 19 | 5.4% |
| Other values (9) | 86 | 24.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 248 | 62.2% |
| Uppercase Letter | 129 | 32.3% |
| Space Separator | 22 | 5.5% |
| Value | Count | Frequency (%) |
| Decimal Number | 224 | 63.1% |
| Uppercase Letter | 114 | 32.1% |
| Space Separator | 17 | 4.8% |
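The Frequency (%) column in these character tables appears to be each count divided by the total character count for the column (399 for Dataset A and 355 for Dataset B, per the Characters and Unicode table). A small Python sketch of that arithmetic follows; the helper name `frequencies` and the dictionary names are illustrative, and the one-decimal rounding is an assumption about how the report formats percentages.

```python
def frequencies(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Counts copied from the "Most occurring categories" tables for Cabin.
cabin_categories_a = {"Decimal Number": 248, "Uppercase Letter": 129, "Space Separator": 22}
cabin_categories_b = {"Decimal Number": 224, "Uppercase Letter": 114, "Space Separator": 17}

freq_a = frequencies(cabin_categories_a)
freq_b = frequencies(cabin_categories_b)

for label, freq in (("Dataset A", freq_a), ("Dataset B", freq_b)):
    print(label, freq)
```

The Space Separator rows reproduce the percentages shown in the tables (5.5% and 4.8%), which supports reading the column as count over total characters.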
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 39 | 30.2% |
| B | 34 | 26.4% |
| D | 20 | 15.5% |
| E | 13 | 10.1% |
| A | 10 | 7.8% |
| F | 9 | 7.0% |
| G | 4 | 3.1% |
| Value | Count | Frequency (%) |
| C | 38 | 33.3% |
| B | 28 | 24.6% |
| E | 19 | 16.7% |
| D | 11 | 9.6% |
| A | 9 | 7.9% |
| F | 6 | 5.3% |
| G | 2 | 1.8% |
| T | 1 | 0.9% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 37 | 14.9% |
| 3 | 36 | 14.5% |
| 1 | 30 | 12.1% |
| 6 | 24 | 9.7% |
| 8 | 23 | 9.3% |
| 4 | 22 | 8.9% |
| 5 | 22 | 8.9% |
| 9 | 21 | 8.5% |
| 7 | 17 | 6.9% |
| 0 | 16 | 6.5% |
| Value | Count | Frequency (%) |
| 2 | 37 | 16.5% |
| 1 | 32 | 14.3% |
| 3 | 31 | 13.8% |
| 6 | 24 | 10.7% |
| 5 | 20 | 8.9% |
| 7 | 20 | 8.9% |
| 4 | 20 | 8.9% |
| 9 | 15 | 6.7% |
| 0 | 14 | 6.2% |
| 8 | 11 | 4.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 22 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 17 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 270 | 67.7% |
| Latin | 129 | 32.3% |
| Value | Count | Frequency (%) |
| Common | 241 | 67.9% |
| Latin | 114 | 32.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 39 | 30.2% |
| B | 34 | 26.4% |
| D | 20 | 15.5% |
| E | 13 | 10.1% |
| A | 10 | 7.8% |
| F | 9 | 7.0% |
| G | 4 | 3.1% |
| Value | Count | Frequency (%) |
| C | 38 | 33.3% |
| B | 28 | 24.6% |
| E | 19 | 16.7% |
| D | 11 | 9.6% |
| A | 9 | 7.9% |
| F | 6 | 5.3% |
| G | 2 | 1.8% |
| T | 1 | 0.9% |
Common
| Value | Count | Frequency (%) |
| 2 | 37 | 13.7% |
| 3 | 36 | 13.3% |
| 1 | 30 | 11.1% |
| 6 | 24 | 8.9% |
| 8 | 23 | 8.5% |
| 4 | 22 | 8.1% |
| 5 | 22 | 8.1% |
| (space) | 22 | 8.1% |
| 9 | 21 | 7.8% |
| 7 | 17 | 6.3% |
| Value | Count | Frequency (%) |
| 2 | 37 | 15.4% |
| 1 | 32 | 13.3% |
| 3 | 31 | 12.9% |
| 6 | 24 | 10.0% |
| 5 | 20 | 8.3% |
| 7 | 20 | 8.3% |
| 4 | 20 | 8.3% |
| (space) | 17 | 7.1% |
| 9 | 15 | 6.2% |
| 0 | 14 | 5.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 399 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 355 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 39 | 9.8% |
| 2 | 37 | 9.3% |
| 3 | 36 | 9.0% |
| B | 34 | 8.5% |
| 1 | 30 | 7.5% |
| 6 | 24 | 6.0% |
| 8 | 23 | 5.8% |
| 4 | 22 | 5.5% |
| 5 | 22 | 5.5% |
| (space) | 22 | 5.5% |
| Other values (8) | 110 | 27.6% |
| Value | Count | Frequency (%) |
| C | 38 | 10.7% |
| 2 | 37 | 10.4% |
| 1 | 32 | 9.0% |
| 3 | 31 | 8.7% |
| B | 28 | 7.9% |
| 6 | 24 | 6.8% |
| 5 | 20 | 5.6% |
| 7 | 20 | 5.6% |
| 4 | 20 | 5.6% |
| E | 19 | 5.4% |
| Other values (9) | 86 | 24.2% |
Embarked
Categorical
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 1 |
| Missing (%) | 0.0% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 446 | 445 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | Q |
| 2nd row | S | S |
| 3rd row | S | C |
| 4th row | C | Q |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 310 | 69.5% |
| C | 100 | 22.4% |
| Q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 79 | 17.7% |
| Q | 40 | 9.0% |
| (Missing) | 1 | 0.2% |
Common Values (Plot): Dataset A, Dataset B
| Value | Count | Frequency (%) |
| s | 310 | 69.5% |
| c | 100 | 22.4% |
| q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| s | 326 | 73.3% |
| c | 79 | 17.8% |
| q | 40 | 9.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 310 | 69.5% |
| C | 100 | 22.4% |
| Q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| S | 326 | 73.3% |
| C | 79 | 17.8% |
| Q | 40 | 9.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 | 100.0% |
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 310 | 69.5% |
| C | 100 | 22.4% |
| Q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| S | 326 | 73.3% |
| C | 79 | 17.8% |
| Q | 40 | 9.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 446 | 100.0% |
| Value | Count | Frequency (%) |
| Latin | 445 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 310 | 69.5% |
| C | 100 | 22.4% |
| Q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| S | 326 | 73.3% |
| C | 79 | 17.8% |
| Q | 40 | 9.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 445 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 310 | 69.5% |
| C | 100 | 22.4% |
| Q | 36 | 8.1% |
| Value | Count | Frequency (%) |
| S | 326 | 73.3% |
| C | 79 | 17.8% |
| Q | 40 | 9.0% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.033 | -0.025 | 0.039 | 0.011 | 0.097 | 0.066 | 0.105 | 0.000 |
| Age | 0.033 | 1.000 | -0.156 | -0.251 | 0.185 | 0.155 | 0.310 | 0.117 | 0.092 |
| SibSp | -0.025 | -0.156 | 1.000 | 0.437 | 0.463 | 0.217 | 0.154 | 0.203 | 0.154 |
| Parch | 0.039 | -0.251 | 0.437 | 1.000 | 0.422 | 0.133 | 0.000 | 0.215 | 0.089 |
| Fare | 0.011 | 0.185 | 0.463 | 0.422 | 1.000 | 0.335 | 0.493 | 0.199 | 0.186 |
| Survived | 0.097 | 0.155 | 0.217 | 0.133 | 0.335 | 1.000 | 0.368 | 0.538 | 0.191 |
| Pclass | 0.066 | 0.310 | 0.154 | 0.000 | 0.493 | 0.368 | 1.000 | 0.144 | 0.240 |
| Sex | 0.105 | 0.117 | 0.203 | 0.215 | 0.199 | 0.538 | 0.144 | 1.000 | 0.083 |
| Embarked | 0.000 | 0.092 | 0.154 | 0.089 | 0.186 | 0.191 | 0.240 | 0.083 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.071 | -0.079 | -0.017 | -0.038 | 0.139 | 0.029 | 0.000 | 0.000 |
| Age | 0.071 | 1.000 | -0.223 | -0.280 | 0.071 | 0.173 | 0.217 | 0.088 | 0.049 |
| SibSp | -0.079 | -0.223 | 1.000 | 0.434 | 0.464 | 0.156 | 0.140 | 0.150 | 0.084 |
| Parch | -0.017 | -0.280 | 0.434 | 1.000 | 0.407 | 0.118 | 0.000 | 0.146 | 0.023 |
| Fare | -0.038 | 0.071 | 0.464 | 0.407 | 1.000 | 0.249 | 0.499 | 0.170 | 0.166 |
| Survived | 0.139 | 0.173 | 0.156 | 0.118 | 0.249 | 1.000 | 0.322 | 0.531 | 0.168 |
| Pclass | 0.029 | 0.217 | 0.140 | 0.000 | 0.499 | 0.322 | 1.000 | 0.118 | 0.276 |
| Sex | 0.000 | 0.088 | 0.150 | 0.146 | 0.170 | 0.531 | 0.118 | 1.000 | 0.168 |
| Embarked | 0.000 | 0.049 | 0.084 | 0.023 | 0.166 | 0.168 | 0.276 | 0.168 | 1.000 |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S |
| 348 | 349 | 1 | 3 | Coutts, Master. William Loch "William" | male | 3.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S |
| 752 | 753 | 0 | 3 | Vande Velde, Mr. Johannes Joseph | male | 33.0 | 0 | 0 | 345780 | 9.5000 | NaN | S |
| 299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
| 812 | 813 | 0 | 2 | Slemen, Mr. Richard James | male | 35.0 | 0 | 0 | 28206 | 10.5000 | NaN | S |
| 857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.0 | 0 | 0 | 113055 | 26.5500 | E17 | S |
| 126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN | 0 | 0 | 370372 | 7.7500 | NaN | Q |
| 647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C |
| 289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 613 | 614 | 0 | 3 | Horgan, Mr. John | male | NaN | 0 | 0 | 370377 | 7.7500 | NaN | Q |
| 694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 | 0 | 0 | 113800 | 26.5500 | NaN | S |
| 295 | 296 | 0 | 1 | Lewy, Mr. Ervin G | male | NaN | 0 | 0 | PC 17612 | 27.7208 | NaN | C |
| 198 | 199 | 1 | 3 | Madigan, Miss. Margaret "Maggie" | female | NaN | 0 | 0 | 370370 | 7.7500 | NaN | Q |
| 797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S |
| 114 | 115 | 0 | 3 | Attalah, Miss. Malake | female | 17.0 | 0 | 0 | 2627 | 14.4583 | NaN | C |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 69 | 70 | 0 | 3 | Kink, Mr. Vincenz | male | 26.0 | 2 | 0 | 315151 | 8.6625 | NaN | S |
| 730 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S |
| 381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S |
| 626 | 627 | 0 | 2 | Kirkland, Rev. Charles Leonard | male | 57.0 | 0 | 0 | 219533 | 12.3500 | NaN | Q |
| 549 | 550 | 1 | 2 | Davies, Master. John Morgan Jr | male | 8.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S |
| 341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 481 | 482 | 0 | 2 | Frost, Mr. Anthony Wood "Archie" | male | NaN | 0 | 0 | 239854 | 0.0000 | NaN | S |
| 148 | 149 | 0 | 2 | Navratil, Mr. Michel ("Louis M Hoffman") | male | 36.5 | 0 | 2 | 230080 | 26.0000 | F2 | S |
| 435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 826 | 827 | 0 | 3 | Lam, Mr. Len | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 287 | 288 | 0 | 3 | Naidenoff, Mr. Penko | male | 22.0 | 0 | 0 | 349206 | 7.8958 | NaN | S |
| 470 | 471 | 0 | 3 | Keefe, Mr. Arthur | male | NaN | 0 | 0 | 323592 | 7.2500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C |
| 573 | 574 | 1 | 3 | Kelly, Miss. Mary | female | NaN | 0 | 0 | 14312 | 7.7500 | NaN | Q |
| 788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S |
| 391 | 392 | 1 | 3 | Jansson, Mr. Carl Olof | male | 21.0 | 0 | 0 | 350034 | 7.7958 | NaN | S |
| 563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S |
| 369 | 370 | 1 | 1 | Aubart, Mme. Leontine Pauline | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C |
| 44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | NaN | Q |
| 142 | 143 | 1 | 3 | Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck) | female | 24.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.